DOCUMENT RESUME



ED 242 718 SP 024 155 

AUTHOR Stiggins, Richard J.; Bridgeford, Nancy J.

TITLE The Use of Performance Assessment in the Classroom.

INSTITUTION Northwest Regional Educational Lab., Portland, OR.
Center for Performance Assessment.

SPONS AGENCY National Inst. of Education (ED), Washington, DC.

PUB DATE Jan 84

CONTRACT 400-83-0005

NOTE 45p.

PUB TYPE Reports - Research/Technical (143)

EDRS PRICE MF01/PC02 Plus Postage.

DESCRIPTORS Academic Achievement; *Achievement Tests; Classroom
Techniques; *Educational Testing; Elementary
Secondary Education; Informal Assessment;
Standardized Tests; *Student Evaluation; *Teacher
Attitudes; *Teacher Made Tests; Test Reliability



ABSTRACT

A study explored the nature and quality of
teacher-developed assessment instruments. Teachers (n=228) from a
range of grades, subjects, and school districts described patterns of
test use, concerns about assessment, and use of performance
assessment by completing an extensive questionnaire. The research was
conducted to determine: (1) teachers' skills, attitudes, perceptions,
and concerns about day-to-day classroom assessment; (2) the extent to
which performance tests (versus other forms of assessment) are used
in classrooms; (3) the nature of performance tests; and (4) whether
(or how) teachers check and/or attempt to improve the quality of
their classroom performance assessments. Results suggested that the
foundation and structure of classroom assessment consists primarily
of teacher-developed assessments, with performance assessment serving
as one of the key tools. Five major issues are analyzed and
discussed: (1) the use and importance of performance assessment in
the classroom; (2) the stability of results across grades, subjects,
and research contexts; (3) teachers' concerns about assessment,
particularly with respect to improving test quality and use; (4)
specific issues of assessment quality, including potential
difficulties in classroom performance assessment procedures; and (5)
actions needed to overcome some of the assessment problems.
(Author/JD)



***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made     *
*                  from the original document.                       *
***********************************************************************






THE USE OF PERFORMANCE ASSESSMENT IN THE CLASSROOM*



Richard J. Stiggins
and
Nancy J. Bridgeford



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this
document do not necessarily represent official NIE
position or policy.



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



Center for Performance Assessment 
Northwest Regional Educational Laboratory
300 S.W. Sixth Avenue
Portland, Oregon 97204 

January, 1984 



*This research was conducted under contract #400-83-0005 with the
National Institute of Education. Opinions expressed in this publication
do not necessarily reflect the position of NIE, and no official
endorsement should be inferred.



Research on classroom assessment has tended to focus on standardized
tests and has paid minimal attention to teacher-developed assessments.
As a result, we have a narrow understanding of the classroom assessment
environment. This study was designed to broaden that understanding by
exploring the nature and quality of teacher-developed assessments.
Particular emphasis was given to understanding the role of performance
assessment (observation and rating of behavior) in the classroom.
Teachers from a range of grades, subjects and school districts described
their patterns of test use, concerns about assessment and use of
performance assessment by completing an extensive questionnaire. When
responses are summarized across teachers, the results suggest that the
foundation and structure of classroom assessment consists primarily of
teacher-developed assessments, with performance assessment serving as one
of the key assessment tools. Teachers are concerned about assessment and
know that improvement may be needed, but will need help to effect needed
changes. Action plans are suggested for enhancing the quality of
teacher-developed tests.






INTRODUCTION



Teachers use many assessment methods to track student growth and
development. They devise many of those assessments themselves. Some are
paper and pencil measures. Others are based on behavioral observations.
This research explores teachers' observation and rating of student
behavior and products as these measures relate to the larger context of
day-to-day classroom assessment. Specifically, the research was
conducted to determine: (a) teachers' skills, attitudes, perceptions and
concerns about day-to-day classroom assessment; (b) the extent to which
performance tests (behavioral observations and ratings), versus other
forms of assessment, are used in classrooms; (c) the nature of classroom
performance tests (e.g., exercises, responses and performance rating
procedures); and (d) whether (or how) teachers check and/or attempt to
improve the quality of their classroom performance assessments.

To date, the measurement community has tended to limit its study of
testing in the schools to the role of large-scale standardized testing
programs. Far less attention has been given to the nature or quality of
teacher-developed classroom assessments. And almost no attention has
been given to the nature or quality of observational assessment methods
like performance assessment. To illustrate, nearly all major recent
studies of teachers' testing practices and attitudes have focused on the
role of standardized tests in the education process (Goslin, 1967;
Lortie, 1975; Airasian, et al., 1977; Stetz and Beck, 1979; Rudman, et
al., 1980; Salmon-Cox, 1981; Sproul and Zubrow, 1981; and Kellaghan,
Madaus and Airasian, 1982). Further, a recent special issue of the
Journal of Educational Measurement (Burstein, 1983) on the state of the
art in linking testing and instruction is introduced as follows:

Linking testing and instruction is a fundamental and
enduring concern in educational practice... Fundamental
questions about how well achievement test items reflect
both student knowledge and the content of instruction are
clearly at the heart of the matter... The
contributors [to this special issue] were asked to limit
their conception of achievement testing to include
standardized achievement tests, curriculum embedded or
locally developed domain-referenced and proficiency
tests, and state assessments... teacher made
tests... were systematically excluded. (p. 99, emphasis
added)

Thus, this "state of the art" review linking testing and instruction was
constrained to the kind of test information obtained from instruments
developed outside the classroom, measures providing only a portion of the
data teachers use to integrate testing and instruction.

This emphasis on large-scale and standardized tests on the part of
measurement researchers may result from the strong tradition of
scientific inquiry in educational research and psychometric models in
educational measurement (Coffman, 1983; Calfee and Drum, 1976). These
emphases lead to admonitions in our measurement textbooks that teachers
should strive to gather "hard data" on student achievement by relying
only on objective tests. Yet, several researchers conclude from their
studies of testing in the schools that teachers purposefully go beyond
test scores and are intent on using observation-based modes of assessment
to acquire information for decision making. For example, in a national
study of classroom assessment, Herman and Dorr-Bremme (1982) report,
"nearly every survey respondent reported that 'my own observations and
students' classwork' was a crucial or important source of information."






In another study, Salmon-Cox (1981) concludes, "overwhelmingly, we
found that teachers, when talking of how they assess their students, most
frequently mention 'observation'. Clearly this favored teacher technique
is quite different from the kind of information provided by standardized
tests."

And Kellaghan, Madaus and Airasian (1982) point out, "Standardized
test information appears to represent an auxiliary or secondary criterion
in [instructional] judgment, since teachers were nearly unanimous in
stating that the most commonly reported grouping criteria were the
teachers' own observations and tests."

In fact, the Herman and Dorr-Bremme study, along with that of Yeh
(1978), are among the few investigations of testing in the schools to go
beyond the role of standardized tests and focus on teacher-developed
tests. Their national survey results suggest that, depending on grade
level, a third to three-quarters of tests used in the classroom are
teacher developed. To understand the implications of the omission of
teacher-developed tests from prominent research on measurement practice,
we must consider the differences between the views of the test specialist
concerned with scientific measurement and the measurement needs of the
classroom teacher. Coffman (1983) provides a concise analysis of these
differences by referring to earlier comments by Scates (1943):

Scates pointed out that the scientist is interested in
truth leading to broad generalizations while the teacher
seeks information of direct practical value; the
scientist is interested in elements whereas the teacher
is interested in functioning organisms; the measurement
specialist cannot measure continuously, but the teacher
needs to and must measure continuously; the scientist
measures traits uniform throughout their range, but the
teacher measures growth in stages; and the measurement
specialist generally measures formal abilities by
cross-section power tests, but the teacher must be
concerned with behavioral dynamics in life situations.
To the extent that Scates' analysis is sound, it is not
surprising that there is little systematic study of
teachers' testing practices reported in the literature
written primarily by researchers and test specialists.

If measurement researchers continue to emphasize only those
tests that serve large-scale assessment purposes, we may fail to
serve teachers' primary measurement needs. Measurement training
that relies on traditional objective tests does not meet all of
the day-to-day assessment needs of teachers. It disregards the
full range of measurement options available to teachers and, most
important, it fails to help teachers produce data needed to
address the day-to-day decisions they face. The research reported
here is designed to broaden our understanding of teachers'
day-to-day assessment needs.

The Types of Classroom Assessment Explored

One goal of this research was to determine the role and relative
importance of four types of measurement in the classroom: the teachers'
own objective tests, published tests, structured performance assessments
and spontaneous performance assessments.

The teachers' own objective tests were defined to include those
multiple choice, true/false, matching and short answer fill-in tests
teachers design for use on a day-to-day basis in their classrooms.
Published tests were defined to include both standardized objective
achievement tests and objective tests supplied as part of published text
materials.







Performance assessment, as defined for the purpose of this research,
is testing to be sure, but not in the traditional sense of objective
test. Rather, performance assessment calls for the observation and
rating of student behavior, and necessitates that students actually
demonstrate proficiency (Stiggins, in press).

Performance tests have several important characteristics. First,
students are called upon to apply the skills and knowledge they have
learned. Second, performance assessment involves completion of a
specified task (or tasks) in the context of real or simulated assessment
exercises. Third, the assessment task or product completed by the
examinee is observed and rated with respect to specified criteria, in
accordance with specified procedures.

In the research reported here, we make an important distinction with
respect to performance assessment. We distinguish between structured and
spontaneous performance assessments. The former is planned and
systematically designed to include prespecified purposes, exercises,
observations and scoring procedures. The latter arises spontaneously
from the naturally-occurring classroom environment and leads the teacher
to a judgment about an individual student's level of development.

In this paper, we summarize results from a large-scale survey of
teachers' uses of these various testing methods, their concerns about
assessment and the specific characteristics of their performance
assessments.






RESEARCH METHODOLOGY

The study was designed to probe assessment practices in a stratified
sample of teachers selected from eight districts across the country, varying
in size and geographic location. Five districts were urban, three suburban;
three were in the East, two in the Northwest, and three in the West. Each
district was to recruit 48 volunteer teachers to complete a comprehensive
questionnaire on classroom assessment. Twelve teachers were to be recruited
from each of four grades (2, 5, 8 and 11). Of those 12 teachers at each
grade level, three were to describe their assessment methods in writing,
three in speaking, three in science and three in math. Thus, each
respondent described assessment methods in only one subject, and at only one
grade level.

Districts responded with completed surveys; however, the number of
completed forms differed substantially across districts. A total of 228
completed questionnaires were received. The respondents were distributed
almost equally across districts, grades and subjects.

Although 228 responses represented less than our desired sample of 384,
the group was sufficiently large to proceed with the analysis. In analyzing
and subsequently interpreting the data, however, we proceeded with caution
for two reasons. First, the sample size precluded an analysis of teachers by
subject area within each grade level (5th grade science teachers, for
example). Analyses of the responses were limited to grade, subject and
district totals. Second, generalizations beyond the volunteer sample were
not attempted.
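
The sampling arithmetic described above can be traced in a short sketch. The figures are taken from the design as stated in this section; the variable names are illustrative only, and the per-cell average is a rough indication (assuming respondents were spread evenly) of why grade-by-subject analyses were not attempted.

```python
# Planned sample design, using the figures given in the text.
DISTRICTS = 8
GRADES = (2, 5, 8, 11)
SUBJECTS = ("writing", "speaking", "science", "math")
TEACHERS_PER_CELL = 3  # teachers per grade-by-subject cell within one district

per_grade = TEACHERS_PER_CELL * len(SUBJECTS)   # teachers per grade in a district
per_district = per_grade * len(GRADES)          # teachers per district
desired_sample = per_district * DISTRICTS       # teachers overall

received = 228                                  # completed questionnaires
response_share = received / desired_sample

# Average respondents per grade-by-subject cell, pooled over all districts
# (an even spread is assumed here purely for illustration).
cells = len(GRADES) * len(SUBJECTS)
avg_per_cell = received / cells

print(per_grade, per_district, desired_sample)
print(f"{response_share:.1%} of the desired sample was obtained")
print(f"about {avg_per_cell:.0f} respondents per grade-by-subject cell")
```

With roughly 14 respondents in each grade-by-subject cell, percentages computed within a single cell would shift by several points for each teacher, which is consistent with limiting the analysis to grade, subject and district totals.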






The Questionnaire

Questionnaire Design. The questionnaire was designed in several
steps. First, questions were devised to tap various levels of concern
about and use of the basic types of classroom assessment. The initial
version of the questionnaire served as the basis for structured
interviews with teachers, during which the questionnaire underwent
extensive revision (Stiggins & Bridgeford, 1982). It was then reviewed and
critiqued by numerous teachers, educational researchers, and editors
through a long series of revisions and refinements. As a final step, the
questionnaire was field tested with 30 teachers from several grades and
subjects.

To ensure that teachers understood the meaning of each type of
assessment covered in the questionnaire, they were provided with concise
definitions of teacher-made objective tests, published tests, structured
performance tests and spontaneous performance assessments at the
beginning of the questionnaire. In each case, the teacher was asked to
supply an example of each kind of test from his or her experience. If
the example revealed that the teacher did not understand the definitions
of and distinctions between assessment types, that teacher's responses
were not included in the analysis. A small number of booklets from each
district (usually 2 or 3) were eliminated for this reason.

Levels of Use. One major set of questions probed teachers' use of
four specific assessment options. Teachers were asked to describe the
importance of different test options as a function of their specific
reasons for testing; that is, for diagnosis, grouping, grading,
evaluating instruction, and reporting achievement results. Respondents
were given these instructions:

Describe the relative importance of each type of assessment by
indicating the weight you give to each in achieving your various
classroom assessment purposes. Each question below identifies a
specific instructional purpose. If a certain type of assessment
carries no weight in achieving a given purpose, you should enter
0% next to it. On the other hand, if you rely completely on one
type of assessment for a specific purpose, you should enter 100%
next to that type. As another example, a response of 25% to
each of the four indicates equal weight to each in achieving
that purpose. Percentages for each purpose should total 100.
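
The response rule in these instructions (four weights per purpose, each between 0 and 100, totalling 100) can be expressed as a small check. This is only a sketch of how such a rule could be verified; the key names and the function `check_reliance` are hypothetical and are not part of the questionnaire or of the authors' procedures.

```python
# Hypothetical keys for the four assessment types named in the text.
TEST_TYPES = ("own_objective", "published", "structured_pa", "spontaneous_pa")


def check_reliance(weights, tolerance=0.0):
    """Return True if the weights for one purpose follow the stated rule.

    The rule: a weight is entered for each of the four test types, every
    weight lies between 0 and 100, and the weights total 100.
    """
    if set(weights) != set(TEST_TYPES):
        return False
    if any(w < 0 or w > 100 for w in weights.values()):
        return False
    return abs(sum(weights.values()) - 100) <= tolerance


# The two examples mentioned in the instructions, plus one invalid response.
equal_weight = {t: 25 for t in TEST_TYPES}
single_type = {"own_objective": 100, "published": 0,
               "structured_pa": 0, "spontaneous_pa": 0}
over_total = {"own_objective": 50, "published": 30,
              "structured_pa": 30, "spontaneous_pa": 0}

print(check_reliance(equal_weight), check_reliance(single_type),
      check_reliance(over_total))
```

The `tolerance` argument is an added convenience for responses that were rounded before entry; the instructions themselves ask for an exact total of 100.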

Second, to determine the extent to which each assessment option
was used, we used an adaptation of a scaling system developed by the
University of Texas Research and Development Center in Teacher
Education (Hall, et al., 1979) to pinpoint teachers' levels of use
of the four alternative assessment methods as indicated in the
following scale:

NONUSE: No action is currently being taken or anticipated
with respect to this type of assessment.

ANTICIPATED USE: The user has decided to start using this
type of assessment, but has not yet acted upon that decision.

PREPARATION TO USE: The user is preparing to use (studying,
taking action to begin using) this type of assessment but is
not yet doing so.

EFFORTFUL USE: The user is using that test type, but that use
is labored, requiring much effort.

COMFORTABLE USE: The user is using this type of assessment
with ease.

REFINING USE: The user is making changes in assessment
procedures to increase outcomes, and is working alone on this.

COLLABORATION IN USING: The user is making deliberate efforts
to coordinate with others in developing and using this type of
assessment.






Scaling on the teachers' level of use was accomplished by having the
respondent answer a branching series of questions about their use of each
test type.
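
The seven-category scale above is ordered, which can be made explicit in a brief sketch. The branching questions themselves are not reproduced in this report, so the code does not attempt to model them; it only encodes the ordering. Treating the four highest categories as "some use" is an assumption made here for illustration, and the names `LevelOfUse`, `is_current_user` and `share_using` are hypothetical.

```python
from enum import IntEnum


class LevelOfUse(IntEnum):
    """The seven levels of use, in the order listed in the text."""
    NONUSE = 0
    ANTICIPATED_USE = 1
    PREPARATION_TO_USE = 2
    EFFORTFUL_USE = 3
    COMFORTABLE_USE = 4
    REFINING_USE = 5
    COLLABORATION_IN_USING = 6


def is_current_user(level):
    """Assumption: effortful use and every higher level count as current use."""
    return level >= LevelOfUse.EFFORTFUL_USE


def share_using(levels):
    """Fraction of respondents whose level indicates current use."""
    levels = list(levels)
    if not levels:
        return 0.0
    return sum(is_current_user(level) for level in levels) / len(levels)


# Made-up responses for one test type, purely to show the calculation.
example = [LevelOfUse.NONUSE, LevelOfUse.COMFORTABLE_USE,
           LevelOfUse.REFINING_USE, LevelOfUse.PREPARATION_TO_USE]
print(share_using(example))
```

An integer-valued enumeration is used so that levels can be compared directly, which suits a scale whose categories run from no use through increasingly established use.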

Types of Concern. We also investigated teachers' concerns about each
individual type of test, by adapting the "levels of concern model"
developed at the University of Texas Research and Development Center
(Hall et al., 1977). This model helps uncover teachers' perceptions of
their own assessment needs by asking teachers to identify their primary
concern (e.g., lack of information, management issues) about each type of
classroom assessment. Possible concerns about teacher-made objective
tests, published tests, and performance assessment (structured and
spontaneous) are listed below. Each teacher was asked to identify his or
her primary concern by selecting from among these statements.

Teachers concerned about:     Select:

Lack of information           I am concerned about my lack of
                              information about developing and using
                              (my own objective paper and pencil
                              tests.)

Competence                    I am concerned about my level of
                              training, skill and experience in
                              developing and using (my own objective
                              paper and pencil tests.)

Time management issues        I am concerned about the amounts of time
                              required to manage the development and
                              use of tests.

Consequences of use           I am concerned about how my students
                              react when I administer (my own
                              objective tests.)

Collaboration in using        I am concerned about establishing
                              working relationships with other
                              teachers to develop and use (objective
                              tests.)

Test improvement              I am concerned about making such tests
                              better and using them more effectively.

Teachers who had no primary concern were asked to leave the item blank.

Teachers' concerns indicate the type of information about testing
that is likely to be of greatest interest and use to teachers at any
given point in time. For example, if a teacher is concerned about the
adequacy of her/his training and skill in assessment, that teacher is
unlikely to be interested in strategies for working with other teachers
to improve testing. Rather, the competence concern must be
satisfactorily addressed first. To assist us in interpreting concerns
more accurately, teachers were also requested to cite the specific
reason(s) why the response they selected was primary for them.

Performance Assessment. The remaining questionnaire items focused
specifically on structured performance assessment. These questions were
asked in two forms. First, teachers were asked to give an example of a
structured performance test used previously. They were then asked to
further describe that example by answering a series of questions about
its development, administration, scoring, use and quality. These initial
sample questions were designed to ensure that teachers understood the
characteristics of performance tests as distinct from other
teacher-developed tests. After describing the example, teachers were
then asked to answer a parallel set of questions about their general use
of structured performance tests. These latter questions (listed in Table
4) provided the specific information which was analyzed in order to
understand teachers' use of performance assessments in each subject area
and grade level.






RESULTS 

Results are summarized in several parts. First, we report teachers'
patterns of test use in terms of the levels of test use scale and the
relative weight teachers assigned to different test types for different
purposes. The analysis then turns to teachers' concerns about
assessment. Concerns of respondents are summarized in terms of (1) the
types of concern, and (2) teachers' stated reasons for those concerns.
The third part of the analysis addresses teachers' use of structured
performance assessments, describing test characteristics and quality
control procedures. In all three cases, data are explored across test
type (teacher-made objective, published, structured performance
assessment, and spontaneous performance assessment), grade level (2, 5,
8, and 11), and subject area (writing, speaking, science, and math).

The overall goal of the analysis is to describe the classroom
assessment practices (use, preferences, attitudes, and role of
performance assessment) of these 228 volunteer teachers. Since these
teachers may not be representative of the general teacher population and
since the practices described reflect what teachers say they do, not
necessarily what they actually do, inferences about the testing practices
of all teachers are not justified.

In all cases, we have attempted to select and discuss the largest,
most notable patterns of difference in teachers' responses as they varied
across test type, grade and subject. Due to the exploratory nature of
the study, limitations in the characteristics of the sample of
respondents, and number and complexity of the questions asked, questions
of the probability of occurrence of particular differences were not
addressed via statistical analysis.

Patterns of Test Use 

Levels of Use. Table 1 reports the percentage of respondents at each
category on the level of use scale.

Looking first at teacher-made objective tests, about half of these
teachers report comfortable use. This holds across grades and subjects.
The other half of the teachers vary in level of use. For instance, use
of teacher-made objective tests tends to increase steadily as grade
increases (i.e., nonuse percent declines); but teachers may struggle some
to increase use of this type of test as indicated by the increase in the
effortful use category. Further, math and science teachers tend to use
their own objective tests slightly more than writing and speaking
teachers.

Note also that (a) about 20% of respondents claim that they do not
use their own objective tests, (b) few teachers anticipate use of this
test type, (c) few are preparing for future use, and (d) collaboration in
use of teacher-developed objective tests is very low. Points b, c and d
remain constant for all test types, grades and subjects.

Regarding published tests, again, nearly half report that they use
these tests with relative ease, with most of the others reporting that
they do not use them at all. There appears to be slightly more use in
early grades and appreciably more use in math relative to other
subjects. Here again there is no preparation for change and no
collaboration.






The levels of use for performance assessment (structured and
spontaneous) differ from the objective tests. Eighty-five percent of
these teachers report some use of structured performance tests.
Forty-eight percent report comfortable use, with another quarter refining
their use of these assessments, and 15% of teachers also report effortful
use. Nearly 95% of respondents report use of spontaneous performance
assessments, with nearly 80% reporting comfortable use. All of these
patterns seem relatively constant across grades and subjects.

Role of Test Type as a Function of Purpose. Patterns of reliance on
test types vary slightly as testing purpose changes. Table 2 summarizes
the relative importance teachers assigned to the various test types for
diagnosing the strengths and weaknesses of individual students, grouping
for instruction, assigning grades, evaluating the effectiveness of an
instructional treatment and reporting results to parents. Since teachers
assigned higher percentages to the methods that contribute most to each
decision, these data are hereafter called "reliance percentages" in
describing and interpreting the results. The higher the reliance
percentage, the more weight given to a type of test for that purpose.

For diagnosis, teacher-developed objective tests are reported to be
given most weight, with both types of performance assessment close
behind. Published tests play a secondary role. Patterns vary across
grades. Teacher-made objective tests appear somewhat more important in
later grades, while published tests seem somewhat less so. Structured
performance assessment is given more importance in diagnosing in grade 11
than in lower grades, while spontaneous performance assessment is
reported to be least important at grade 11. Across school subjects,
teacher-made objective tests appear most important for diagnosing in







TABLE 1

LEVEL OF USE BY TEST TYPE, GRADE AND SUBJECT
(in percent of respondents)

                               Grade                  Subject*
                           2     5     8    11    WR    SP    SC    MA Total
N                         57    58    58    55    58    61    50    59   228

Teacher-made Objective Tests
  Nonuse                  32    26    15     9    26    29    12    14    21
  Anticipated use          -     -     2     -     2     -     -     -   0.4
  Preparation to use       2     -     -     2     -     2     2     -     1
  Effortful use            5    10    15    25    11    21     8    14    14
  Comfortable use         53    45    53    47    47    40    61   [?]    49
  Refinement               9    19    15    15    14     9    14    20    14
  Collaboration            -     -     2     2     -     -     2     -     1

Published Tests
  Nonuse                  30    25    40    44    34    54    34    16    35
  Anticipated use          -     4     2     2     2     -     6     -     2
  Preparation to use       4     -     3     7     5     3     4     2     4
  Effortful use            9     7     7     4     4     3     4    15     7
  Comfortable use         49    56    38    35    41    38    38    61    45
  Refinement               9     9     9     6    14     2    10     7     8
  Collaboration            -     -     2     2     -     -     4     -     1

Structured Performance Assessment
  Nonuse                  17     4    14     8     4    13    11    14    10
  Anticipated use          -     2     2     -     -     2     -     2     1
  Preparation to use       -     4     -     -     2     2     -     -     1
  Effortful use           11    14    16    17    18    10    23     9    15
  Comfortable use         57    51    46    40    52    52    36    52    48
  Refinement              13    26    21    26    25    17    23    22    22
  Collaboration            2     -     2    10     -     -     6     2     3

Spontaneous Performance Assessment
  Nonuse                   -     -     -     2     -     -     2     -     1
  Anticipated use          -     -     -     -     -     -     -     -     -
  Preparation to use       -     -     -     -     -     -     -     -     -
  Effortful use            2     5     2     4     7     3     2     -     3
  Comfortable use         84    83    82    66    77    85    72    79    79
  Refinement              11     9    17    17    14     8    17   [?]    13
  Collaboration            2     2     -     2     -     -     2     4     1

*WR stands for Writing, SP for Speaking, SC for Science, MA for Mathematics






science and math, while structured performance assessment is most
important in writing assessment, and spontaneous performance assessment
is given most weight in speaking diagnosis.

When forming instructional groups, on the average, these teachers
give approximately equal weight to all four types of tests. However,
examination of grade and subject differences reveals some notable
variations. For instance, as grade increases, the importance of
published tests and spontaneous performance assessment decreases, while
weight given to structured performance assessment and teacher-made
objective tests increases. Also, for grouping (as for diagnosing), math
and science teachers tend to rely on their own objective tests, while
writing teachers give most weight to structured performance assessment
and speaking teachers rely most heavily on spontaneous performance
assessment.

When assigning grades, teacher-made objective tests stand out as most
important, followed by structured performance assessment. Published
tests and spontaneous performance assessment play lesser roles. Within
this pattern, however, there are clear trends across grades. As grade
level increases, the weight given to objective tests and structured
performance assessment goes up, while that given to published tests and
spontaneous performance assessment goes down. Across school subjects,
once again, math and science teachers give most credence to their own
objective tests, while writing teachers rely most on structured
performance tests.

In order to evaluate the effectiveness of an instructional treatment,
these teachers tend to use their own objective tests, followed by
structured and/or spontaneous performance assessments. Published tests






TABLE 2

ROLE OF TEST TYPE AS A FUNCTION OF PURPOSE FOR
ASSESSMENTS REPORTED BY GRADE AND SUBJECT
(in reliance percentages)

                               Grade                  Subject
                           2     5     8    11    WR    SP    SC    MA Total
N                         57    58    58    55    58    61    50    59   228

Diagnosing
  OBJ*                    25    27    33    37    24    24    41    34    31
  PUB                     29    25    12    13    19    14    15    21    17
  ST PA                   24    23    27    35  3[?]    26    23    22    27
  SP PA                   32    26    23    15    20    35    21    24    25

Grouping
  OBJ                     25    27    32    32    20    24    36    38    29
  PUB                     23    32     -    13    25    21     -    30    25
  ST PA                   18    22    23    32    34    21   [?]    19    24
  SP PA                   28    19    24    14    18    33    24    12    22

Grading
  OBJ                     29    36    43    48    34    33    46    44    39
  PUB                     19    22     8     9    14    11    12    20    15
  ST PA                   23    22    28    34    36    27    24    20    27
  SP PA                   28    20    17    10    16    24    18    16    19

Evaluating
  OBJ                     30    35    36    39    31    33    44    35    35
  PUB                     19    24    12    14    18    11    15    25    17
  ST PA                   21    22    32  2[?]    36    28    19    20    26
  SP PA                   30    20    19    18    15    29    22    20    22

Reporting
  OBJ                     29    30    38  4[?]    29    30    45    38    35
  PUB                     22    29    14    10    20    14    17    26    19
  ST PA                   25    23    30    31    35    28    22    23    27
  SP PA                   26    18    18    14    17    28    18    13    19

* OBJ stands for teacher-made objective tests, PUB for published tests, ST PA
for structured performance assessment and SP PA for spontaneous performance
assessment.






are secondary. Reliance on objective tests increases with grade, as does
reliance on structured performance assessment. Reliance on published
tests fluctuates with grade, while the weight given to spontaneous
performance assessment drops after grade 2. Across subjects, science and
math teachers evaluate most heavily based on their own objective tests,
while structured performance assessment is more important in writing and
speaking.

And finally, when the purpose for assessment is reporting achievement
results to parents, teachers rely most heavily on their own objective
tests and structured performance assessment. Many of the same grade and
school subject patterns referenced above appear here also. Objective and
structured performance tests increase in importance as grade increases,
while published and spontaneous performance assessment decrease in
importance. Thus, math and science teachers weight their own objective
tests most heavily, while writing and speaking teachers tend to use
performance assessment.

Concerns about Assessment

Type of Concern. Table 3 reports teachers' types of concern about
different kinds of tests. The percentage of teachers selecting each
category as her or his primary concern is reported.

Note that 28% of the total sample of teachers registered no concern
about teacher-made objective tests. Thus, nearly three-quarters
expressed some primary concern. By far the most common concern about
teacher-made objective tests focused on test improvement, reflecting
teachers' desire to improve their use of this kind of test. The other
common concern is management, reflecting uneasiness with the amount of
time required to manage this mode of assessment in the classroom. These




                                   TABLE 3

             SUMMARY OF TYPE OF CONCERN ABOUT ASSESSMENT BY
                      TEST TYPE, GRADE, AND SUBJECT
                       (in percent of respondents)

                              Grade                   Subject            Total
Concern                    2     5     8    11  WRIT    SP    SC    MA  Sample

N                         57    58    58    55    58    61    50    59     228

Teacher-made Objective Tests

No concern                46    22    30    15    31    35    20    25      28
Lack of information        5    --    --    --     3     2    --    --       1
Competence                 2     2     2    --    --     2     2     2       1
Time Management           14    22    18    22    24    15    22    15      19
Consequence               --     7     5     7     9     7     2     2       5
Collaboration              4     3    --     7     3     2     2     7       4
Improvement               20    --    --    --    --    --    --    --      --

Published Tests

No concern                41    31    42    38    38    37    32    45      38
Lack of information       --     9     5     9    10     8    10    --       7
Competence                --     3    --    --     2    --    --     2       1
Time Management           15    12     5     6     7     5    10    14       9
Consequence               20    17    14    33    19    15    28    22      21
Collaboration              2     3     4     4     3     2     2     5       3
Improvement               20    24    30    11    21    33    18    12      21

Structured Performance Assessment

No concern                50    33    35    24    29    44    25    42      35
Lack of information        4    --     4    --     3     2     2    --       2
Competence                 4     4     7     2     3    --    10     4       4
Time Management           20    21    16    26    21    12    27    25      21
Consequence                2    11     9     6     7    10    10    --       7
Collaboration             --     4     7     6     5     3     2     5       4
Improvement               20    28    23    38    31    29    25    25      27

Spontaneous Performance Assessment

No concern                59    48    39    39    41    58    39    46      46
Lack of information        2     2     2     6     3     2     2     4       3
Competence                 1     5     7     6     5     3    10     2       5
Time Management            9    10     2     4    10     3     6     5       6
Consequence                2     7     5     9     5     5     6     7       6
Collaboration             --    --     2     7     2     2     4     4       3
Improvement               24    28    44    30    33    27    33    33      31






teachers do not tend to be concerned about a lack of information about
these tests, their competence in using them, the student reactions to
their use, or collaborating with others in using them. These patterns of
concern vary with grade and slightly with subject. For example, about
half of the second grade respondents expressed some concern, while 85% of
the eleventh grade teachers did so. There is an increasing concern about
quality and management of teacher-made objective tests as grade
increases, and for math and science teachers in contrast to writing and
speaking teachers.

Fewer teachers expressed specific concerns about published tests. 
About 40% of the total sample expressed no concern. Of those expressing 
some concern, most were uneasy about (1) student reactions and (2) test 
improvement. More eleventh grade teachers seem concerned about 
consequences than teachers at other grades. Beyond this, response 
patterns were generally stable across grades and subjects. 

Expressions of concern about structured performance assessments were
similar to those for teacher-made objective tests: improving quality and
time management were most crucial. Some grade level trends appear, with
indications that concern for improving such assessments and using them
more effectively increases with grade level.

Spontaneous performance assessments elicit the fewest expressions of
concern, with only half of the respondents reporting some concern. Most
of these were concerned with improvement of the assessments. Again, the
frequency of this concern seemed to gradually increase with grade level.

Reasons for Concern. After teachers indicated their primary concern,
they were also asked to specify why that concern was primary for them.






The two most common types of concerns mentioned about teacher-made
objective tests were improving test quality and time management. The
reason for the teachers' concern about time required to develop and use
their own tests is that it interferes with instructional time. Teachers
who indicated uneasiness about the objective tests they developed and
used posed such questions as: Are my tests effective? How can I make
them better? Do they focus on students' real skills? Are they
challenging enough? Do they aid in learning?

The two most frequent concerns about published tests related to
students' reactions and improving the quality of test use. Those
concerned about student reactions to published tests tended to view these
tests as invalid, undependable, too long, etc. and thus anticipated that
the tests were not helpful to students. Those concerned about improving
test use see published tests as time-consuming, not matching their
instruction, failing to reflect true student characteristics and
generally not meeting important instructional needs such as identifying
material to teach or reteach. For these reasons, they would like the
tests revised and improved or would like to learn to use them more
effectively. Published tests generated the most negative comments in
respondents' expression of concerns. Many teachers see them as
interfering with instruction.

Concerns about performance assessment, structured and spontaneous,
dealt primarily with the desire to improve both the assessment and its
use. Teachers' test quality concerns focused on accuracy of assessment,
difficulty in defining levels of performance and the need to be
objective. Test use issues reflected a desire to measure growth, to
challenge (but not intimidate) students, and to provide diagnostic
information. Some were also concerned about the time demands of using
performance assessments.

Classroom Performance Assessment

Seventy-eight percent of the teachers completing the questionnaire
reported using structured performance assessments in their classrooms.
Those 177 teachers responded to a series of questions which described
their assessments. Results are presented by grade and subject in Table 4.

Responses to item 1 in Table 4 describe teachers' quality control
procedures. Teachers were asked to indicate the percent of their
performance assessments in which they include various procedures. On the
average, teachers do the following in the majority of their assessments:

(Part A) specify a reason in their mind for assessment prior to
testing

(Part C) inform students of their scoring criteria

(Part D) plan scoring procedures in advance

(Part E) define levels of performance

On the other hand, less than half of the assessments include (B)
written performance assessment criteria or (G) multiple performance
observations before making a judgment. And finally, teachers seldom (F)
rated performance without knowledge of the students' identity, or (H)
cross-checked judgments about performance with other test scores.

There are some differences in responses across grades. For instance,
as grade increases, so does the tendency to write down criteria and
inform students of them, plan scoring procedures, define levels of



                                  TABLE 4

        DESCRIPTION OF PERFORMANCE ASSESSMENT BY GRADE AND SUBJECT

1. In what percentage of all your STRUCTURED PERFORMANCE ASSESSMENTS do you
   A. Specify the reason for assessment in your own mind prior to
      conducting that assessment?
   B. Write down scoring criteria before assessment?
   C. Inform students of scoring criteria before assessment?
   D. Plan actual scoring or rating procedures before assessment?
   E. Clearly define levels of performance from adequate to
      inadequate before rating performance?
   F. Conduct "blind" ratings of student products (i.e., rate
      performance without knowledge of who the respondent is)?
   G. Observe and rate performance more than once before making
      a judgment?
   H. Check your judgments against objective or published test
      scores before making a final decision?

2. What percentage of all of your STRUCTURED PERFORMANCE TESTS
   involve the evaluation of
      Students doing things (behavior)?
      Products created by students?

3. When you observe and rate performance, with what percentage of your
   assessments do you use the following procedures to record your
   judgments?
   A. Checklists (list of skills present or absent)
   B. Rating Scales (continuum from good to poor quality performance)
   C. Anecdotal Records (written descriptions of performance)
   D. A Grade (in a record book)
   E. Mental Notes (accumulated in memory over time)

4. What proportion of all of your STRUCTURED PERFORMANCE ASSESSMENTS
   do you score
      Holistically (scoring overall proficiency)?
      Analytically (scoring specific subskills)?
      Both holistically and analytically?

5. What proportion of your STRUCTURED PERFORMANCE ASSESSMENTS are
   conducted without students being aware that you are assessing them?

6. When rating students, do you always do the rating or do colleagues
   or the students themselves play a role? Indicate the appropriate
   percentage of ratings conducted by each potential rater listed below.
   A. I (the teacher) do the rating
   B. Colleague rates student performance
   C. Students rate each other's performance
   D. Students rate their own performance

7. What proportion of your STRUCTURED PERFORMANCE ASSESSMENT results
   is interpreted primarily by comparing student performance to
      That of other students (norm referenced interpretation)?
      Specific preset standards or criteria of minimum acceptable
      performance (criterion referenced interpretation)?

                            Grade                   Subject
Item                     2     5     8    11  WRIT    SP   SCI  MATH   TOTAL

N                       38    51    41    47    46    46    38    47     177

1A                      79    82    88    85    87    86    76    84      83
1B                      28    41    63    58    48    61    34    46      48
1C                      35    62    73    78    66    73    57    55      63
1D                      57    61    75    75    70    73    57    68      67
1E                      48    58    76    69    66    69    56    60      63
1F                       8    14    28    23    10    12    32    23      18
1G                      51    42    47    41    44    43    42    50      45
1H                      21    21    22    18    12    13    19    38      21

2 Behavior              52    49    55    55    32    67    56    57      53
2 Products              --    --    --    --    --    --    --    --      47

3A                      31    30    35    35    35    40    24    21      33
3B                      28    33    45    42    34    --    35    39      37
3C                      23    23    35    33    26    --    20    25      28
3D                      38    63    85    86    71    65    66    70      68
3E                      46    37    50    28    39    --    18    33      40

4 Holistic              26    28    29    22    27    15    32    31      26
4 Analytic              18    21    15    20    20    12    --    23      19
4 Both                  54    51    55    60    50    75    --    44      55

5                       40    25    13    13    17    52    56    23      22

6A                      90    84    82    90    86    --    --    85      87
6B                       6     2     2     4     1     5    --     4       4
6C                       5    11    19    14    14    19    --     9      12
6D                      13    19    16    10    15    18    --    13      15

7 Norm ref.             38    34    25    32    27    27    --    35      32
7 Criterion ref.        52    64    75    69    71    --    --    64      67



performance, and conduct blind ratings. Differences across subjects are
less pronounced, but generally suggest that quality control activities do
vary somewhat on this dimension also. For instance, teachers dealing
with speaking assessment appear more likely to write down scoring
criteria than others and are more likely to inform students of them than
are math and science teachers. Further, it appears that science teachers
are somewhat less likely to plan scoring procedures in advance of the
assessment than are the others. Math and science teachers use blind
scoring more frequently than their writing and speaking counterparts.
And finally, teachers appear more likely to check their judgments against
test scores when dealing with math in contrast to other subjects.

In the remaining items in Table 4, teachers further described
characteristics of their structured performance assessments. Teachers
reported that these assessments tended to be: equally divided between
evaluations of process and product (item 2); recorded most frequently as
a grade in the record book, and less frequently as mental notes, rating
scales, checklists and anecdotal records (3); scored both holistically
and analytically (4); conducted with the awareness of the student (5);
based on teachers' judgments, with students rarely playing a role in self
or peer assessment (6); and criterion referenced or based on
pre-established standards of acceptable performance (7).

The data reported in Table 4 reveal some notable differences in test
characteristics across grades and subjects. For instance, as grade
increases, so does reliance on rating scales and grades. However, the
use of unobtrusive assessment (5) decreases as grade increases.
Comparing subjects, writing assessment is most frequently based on
product evaluation (presumably writing samples), while others are more
process oriented. Speaking assessments use slightly more checklists and
rating scales than others, while science assessors rely heavily on mental
record keeping. Speaking assessments tend to be scored more completely
(holistically and analytically) than others. All other characteristics
are quite constant across subjects.

Relating Reliance Percentages Across Purposes. A correlational
analysis was conducted to explore the question of whether teachers who
rely heavily on one assessment procedure for one purpose tend to rely
heavily on that same procedure for other purposes. Essentially, this is
a follow-up analysis to the results presented in Table 2, where we saw
that the weight given to any particular test type tended to vary only
slightly across purposes. To verify this prior conclusion, we would need
to find high correlations between weights assigned for the same test for
different purposes. The results are presented in Table 5.

Since all 40 correlations reported are consistently quite large, a
teacher's reliance on a particular assessment method appears to be
somewhat stable across differing purposes. The average correlation
between reliance indicators across purposes for teacher-made objective
tests is .65, as it is for structured performance assessments. For the
other two test types, the mean correlations were somewhat lower: .51 for
published tests and .55 for spontaneous performance assessments. There
are other notable patterns across matrices. For instance, in all four
cases, the lowest correlation (average .44) is between the weight given
to an assessment procedure for evaluating instruction and the weight
given to that same procedure for instructional grouping. Also, in all



                                  TABLE 5

        CORRELATIONS AMONG IMPORTANCE RATINGS OF SAME TEST TYPE
                      USED FOR DIFFERENT PURPOSES

                  Teacher-made Objective Tests      Published Tests
                   (1)    (2)    (3)    (4)         (1)    (2)    (3)    (4)
(1) Diagnosis
(2) Grouping       .61                               --
(3) Grading        .76    .59                       .54    .57
(4) Evaluating     .60    .44    .71                 --    .46    .67
(5) Reporting      .74    .57    .81    .69         .54    .63    .67    .70

                  Structured Performance            Spontaneous Performance
                  Assessment                        Assessment
                   (1)    (2)    (3)    (4)         (1)    (2)    (3)    (4)
(1) Diagnosis
(2) Grouping       .63                               --
(3) Grading        .82    .56                       .62    .45
(4) Evaluating     .64    .45    .72                .48    .42    .48
(5) Reporting      .71    .56    .74    .71         .59    .56    .72    .53



four cases, the highest correlations are found between the weight given
when reporting achievement to parents and the weight given in grading
(averaging .73), and between the weight given in grading and that given
in diagnosing student strengths and weaknesses (.68).
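
The correlational analysis described above can be illustrated with a short sketch. It is not the authors' procedure: the reliance weights below are invented for five hypothetical teachers, the function names are illustrative, and Pearson's r is assumed as the coefficient, which the report does not state.

```python
from itertools import combinations
from math import sqrt

PURPOSES = ["diagnosis", "grouping", "grading", "evaluating", "reporting"]

# Hypothetical reliance weights (percent) that five teachers give ONE test
# type for each purpose. These numbers are invented for illustration only.
weights = {
    "diagnosis":  [30, 45, 20, 35, 50],
    "grouping":   [25, 40, 30, 30, 45],
    "grading":    [35, 50, 25, 40, 55],
    "evaluating": [30, 35, 35, 45, 40],
    "reporting":  [35, 50, 20, 40, 60],
}


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def purpose_correlations(w):
    """Correlate the weights for every pair of purposes (one test type)."""
    return {(p, q): pearson_r(w[p], w[q]) for p, q in combinations(PURPOSES, 2)}


corr = purpose_correlations(weights)
mean_r = sum(corr.values()) / len(corr)
lowest_pair = min(corr, key=corr.get)

for (p, q), r in corr.items():
    print(f"{p:>10} x {q:<10} r = {r:.2f}")
print(f"mean r = {mean_r:.2f}; lowest pair: {lowest_pair}")
```

With five purposes there are 10 distinct pairs per test type, which is consistent with the 40 correlations reported for the four test types.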






DISCUSSION AND CONCLUSIONS

From these results, we have selected five major issues for further
analysis and discussion. These issues capture what we feel are the most
important insights about classroom assessment to be derived from the
data. In this section, we draw conclusions about (1) the use and
importance of performance assessment in the classroom; (2) the stability
of results across grades, subjects, and research contexts; (3) teachers'
concerns about assessment, particularly with respect to improving test
quality and use; (4) specific issues of assessment quality, including
potential difficulties in classroom performance assessment procedures;
and (5) actions needed to overcome some of the assessment problems.

The Nature and Role of Performance Assessment in the Classroom

Our previous studies (Stiggins and Bridgeford, 1982) led to the
conclusion that performance assessment (the observation and rating of
student behavior and/or products) plays a key role in the day-to-day
measurement of student achievement in the classroom. This study
reinforces that conclusion. A large majority (177 of 228) of the
teachers in this study report using structured performance assessment in
the classroom. More important, the weights assigned to structured and
spontaneous performance assessments show them to be heavily used modes of
assessment in all five decision contexts explored. This appears to be
true across the grades and school subjects examined. Our data indicate
that performance assessment and teacher-made objective tests form the
basis of most classroom assessment. Published tests play a secondary
role. Teachers, moreover, have considerable confidence in their ability
to make accurate observations and professional judgments; they express
comfort with performance assessment, and rely on it as a key method of
judging students' learning. But the data also indicate that this
confidence should not be confused with complacency. As we have seen,
many teachers are sensitive to the fact that there may be problems in
their assessments, are concerned about improving test quality and want to
find ways to improve test use.

What are classroom performance assessments like? In one sense, they
vary greatly across teachers and in another sense they are quite
similar. The specific ingredients of the tests vary across subjects and
to a certain extent across grades. Exercises, performance criteria, and
student responses obviously vary as a function of school subject.
However, the form of the assessment remains constant. Teachers evaluate
both behaviors and products in approximately equal proportions. They
tend to use prespecified standards (rather than student comparisons), to
record assessment results with a grade in a record book, and not to
involve students in performance ratings. Though most teachers know in
advance why they are assessing (a key to quality assessment), some may
fail to apply other quality control procedures to their performance
assessments. We will explore this point in greater detail below.

Examining common characteristics of these assessments leads to the
conclusion that performance assessment may not be used as effectively as
is possible. For instance, students represent an untapped reservoir of
performance raters, especially when teacher time is at a premium.
Students can successfully rate their own and one another's performance
and can learn a great deal from doing so (Spandel, 1981). For another
example, recording systems other than grades often provide valuable and
rich feedback to students. Checklists, rating scales, and anecdotal
records, for example, offer the detail often needed to describe
performance and make careful assessments. The heavy reliance on grades
seen in the data suggests that these alternatives are not being used to
advantage.

Thus, results from this study confirm that performance assessment is
an important assessment tool for teachers in the classroom. Results also
indicate that the use of this assessment method could be enhanced and
expanded.

Stability and Change in Assessment Procedures

Within the pattern of relatively constant assessment methods,
however, there are a few variations worthy of note. In this section, we
explore the implications of those variations across grade, subject and
test type.

We found three interesting changes in assessment procedures as grade
increases. First, the higher the grade level, the greater the tendency
for teachers to report using their own assessments rather than published
tests. Second, teachers' concern about assessment increases with grade
level. And third, teachers' attention to quality control issues with
performance assessments increases slightly with grade level. Levels of
use of performance assessment as well as specific attributes of those
assessments vary somewhat across grades. Thus, grade level appears to be
an important variable in understanding classroom assessment. Elementary,
junior high and high school environments differ in fundamental ways. The
increased use of teacher-developed tests at higher grade levels might
reflect the teacher's need to tailor tests to cover unique classroom
objectives at higher levels. The reason for increased concern about
assessment across grade levels may relate to the increased importance
placed on grades as a measure of student progress and success as grade
increases. And increased attention to quality control may reflect the
increased concern with accurately judging and grading students: clearly
grades take on more importance as students advance in the school system,
and can influence future decisions of students. These and other
speculations deserve further consideration in future research.

Assessment procedures also differ as a function of school subject.
This is to be expected and our data support this notion. Math and
science teachers tend to rely more heavily on paper-and-pencil tests than
do writing and speaking teachers. Speaking and writing teachers tend to
use more performance assessments and the performance assessments they use
tend to differ somewhat from those used by math and science teachers.
Regardless, concerns about improving test quality and use tend to remain
quite constant across subject.

We can also draw some conclusions, based on the data, about
variations in assessment approach among teachers and for a given
teacher. For instance, we have evidence that these teachers are
relatively consistent in the assessment methods they use. They do not
vary their testing methods very much as the purpose for assessment
varies. This finding calls into question our conclusion in earlier
studies that performance tests are instructional tools while objective
tests are grading tools (Stiggins and Bridgeford, 1982). Both tests
appear to play a role in both purposes. As these teachers described
their levels of use, only a handful of the 228 teachers reported that
they anticipated using or were preparing to use a new type of assessment
in the future. These teachers are not exploring new assessment
approaches. This conclusion has implications for the action plans
outlined below.

Teachers' Concerns About Assessment

At least three-quarters of the 228 teachers queried in this study
expressed some concern about the assessments they used. Further, over
half of the respondents indicated concern about each of the four
assessment methods. Even when teachers reported relatively comfortable
use of a given form of assessment, they were not reluctant to express a
desire to improve their tests and the manner in which those tests are
used. Their most frequently expressed concern involved improving the
quality and use of assessments. Added to that, teachers frequently
reported concern about their ability to effectively integrate assessment
given the time constraints imposed by the classroom. Overall, teachers'
responses in this study indicated concern about assessment quality and
frustration at the lack of time available to deal more adequately with
the problem.

But even more paradoxical and potentially troubling is the fact that
although teachers are obviously concerned and many want to improve, at
the same time (as cited above) these same teachers do not appear to be in
the process of changing in ways that will improve their assessment
methods. Clearly, many, though concerned, appear to lack the
opportunity, time, means or motivation to revise their assessment
approaches. We consider this dilemma further in addressing needed action
programs.






The Extent of the Problem

Obviously, many teachers wonder about the effectiveness of the
assessments they are using. But is there really reason to be concerned?
Information on this issue from our data is limited but provides some
insight. From the self-report data on quality control efforts in
structured performance assessments, teachers' uneasiness may be
justified. For example, in at least a third of the structured
performance assessments conducted by these teachers, important assessment
procedures appear not to be followed: students are not informed of
performance criteria, scoring procedures are not planned in advance, and
levels of performance (adequate to inadequate) are not defined before
rating performance. Further, in over half of these assessments on the
average, scoring criteria are not written down, judgments are based on a
single observation, and performance ratings are not checked against other
indicators, such as test scores. Finally, in an average of 40 percent of
the structured performance assessments, teachers rely on mental record
keeping. Since these practices can contribute significantly to the
invalidity and/or unreliability of structured performance assessment
results, there seems to be reason for concern.
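
The "at least a third" and "over half" figures above follow from the complements of the item 1 percentages in Table 4. The sketch below makes that arithmetic explicit. The values are read from the Total column of Table 4 as it survives in this copy, and the labels and function name are illustrative rather than taken from the questionnaire.

```python
# Total-sample percentages for item 1 of Table 4: the share of structured
# performance assessments in which each quality control procedure is used.
# Labels are shortened paraphrases, not the questionnaire wording.
procedure_used = {
    "A reason specified in advance": 83,
    "B scoring criteria written down": 48,
    "C students informed of criteria": 63,
    "D scoring procedures planned": 67,
    "E performance levels defined": 63,
    "F blind rating": 18,
    "G observed more than once": 45,
    "H checked against test scores": 21,
}


def percent_not_used(used):
    """Complement of each percentage: assessments lacking the procedure."""
    return {name: 100 - pct for name, pct in used.items()}


not_used = percent_not_used(procedure_used)
over_half = sorted(name for name, pct in not_used.items() if pct > 50)

for name, pct in not_used.items():
    print(f"{name:<34} omitted in {pct}% of assessments")
print("Omitted in over half:", ", ".join(over_half))
```

Procedures C, D and E are each omitted in roughly a third of assessments, while B, F, G and H are omitted in more than half.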

Thus, the data suggest real problems. But caution is needed in
interpreting these problems. The statistics presented above can be
interpreted from a "glass half empty" or "glass half full" perspective.
Pessimists say we have much to do. Optimists say much is already being
done. Both are right. Many teachers do an excellent job of assessing,
adhering to key aspects of quality control in the important assessments.
In our discussions, interviews, and questionnaire responses, we found
many very creative applications of performance assessment used in the
classroom, and there appears to be a strong foundation of good assessment
present in many classrooms. We can build from that. Many teachers are
not complacent. We can count on that. So, how do we proceed?

Moving Toward a Solution

Though the extent and depth of this problem is only suggested by
these data, the problem is obviously significant. To deal with it, we
propose a solution including four parts: (1) greater sensitivity to
teachers' needs on the part of the measurement community; (2) more
qualitative research on classroom assessment practices; (3) collaboration
among teachers; and (4) inservice training designed to meet teachers'
needs. We have two key factors in our favor as we consider changes.
First, our data on concerns suggest that many teachers are aware of the
need to use assessment more effectively; they want to improve. Second,
many teachers are strong assessors.

How can we use these factors to advantage? First and foremost, the
measurement community must give greater attention to the classroom
assessment needs of teachers. With a few notable exceptions, as a
community of educators, we have only a limited understanding of the
classroom assessment environment and teachers' most pressing assessment
concerns. Evidence of this fact is presented in Table 6. We found that
teachers rely on both observational assessment and teacher-made objective
tests; published tests have considerably less influence on teachers.
Yet, textbooks used in teacher training provide almost no instruction in
the assessment methods most relevant for classroom use. Even more
important, measurement research (as reported in professional journals)
concentrates on assessment methods that have the least utility for







                                  TABLE 6

     RELATIVE IMPORTANCE OF TEST TYPE IN THE PROFESSIONAL LITERATURE
                     AND IN TERMS OF TEACHERS' NEEDS

Emphasis on:                    In Texts (1)   In Research (2)   For Teachers (3)

Teacher-made objective tests        47%              29%               34%
Published tests                     47%              62%               19%
Performance assessment               6%               9%               47%

(1) Approximate percent of text pages on test construction and use in six
introductory measurement textbooks: Ahmann & Glock, 1971; Brown, 1970;
Ebel, 1979; Gronlund, 1981; Mehrens and Lehmann, 1973; Noll, et al., 1979.

(2) Approximate percent of articles dealing with those tests and test
development in volumes 17, 18, 19 and 20 (1980, 81, 82, 83) of the Journal
of Educational Measurement.

(3) Reliance percentages summarized from Table 2, averaged across purposes
and combining structured and spontaneous performance assessments.






teachers' decision making. As researchers, our focus must be redirected
to include assessment methods and quality control issues in the classroom
environment that affect student learning and instruction.

Second, we need more research on the classroom assessment needs of
teachers. Extensive research on the role and use of standardized test
scores in the classroom has certainly played an important role in helping
us deal with some key assessment problems. But the time has now come to
move to a new emphasis; namely, understanding the role of strategies such
as teacher observation in classroom assessment. The research reported
here represents a small but potentially useful step in that direction.
We might also follow the lead of Good and Brophy (1978), who have
provided teachers with systematic strategies for observing in the
classroom.

Third, teachers who are competent assessors are another vital
training resource which must be tapped. Results of this study suggest
that teachers who rely most heavily on performance assessments tend to
use such tests somewhat more carefully than those who use them less.
Teachers with assessment skill can assist their colleagues. Previous
research revealed that teachers regard colleagues as one of the two most
important sources of assessment ideas (Stiggins and Bridgeford, 1982).
Yet this study revealed little or no collaboration among teachers in test
use. These two findings identify a valuable source of ideas that is not
being tapped. Why? Because there is no time, encouragement or planning
to do so. Test quality may be readily improved by encouraging and
promoting collaboration in assessment.






Greater awareness of the classroom assessment environment and its
demands can form the basis for another important element in our plan of
action: relevant training for teachers. Based on the textbooks
examined in Table 6, current and past training is out of balance.
Further, a large proportion of teachers have had no measurement training
at all (Stiggins and Bridgeford, 1982; Coffman, 1983). Many teacher
preparation programs (graduate and undergraduate) do not require
measurement training and many teachers avoid it, given a choice. One
reason for this avoidance is that our training fails, by reputation, to
meet important teacher needs.

As we design and develop training that is more relevant to teachers'
classroom assessment needs, all available resources must be tapped. For
instance, graduate and undergraduate teacher preparation courses continue
to offer an opportunity for relevant training. Perhaps the student
teaching experience could be structured to deal directly with classroom
assessment issues. But inservice training, structured to meet teachers'
assessment needs, provides the greatest opportunity for impact. The key
to success in both settings will not be to present more "strategies to
interpret standardized test scores." Kellaghan and others (1982) have
shown these have little impact on teachers' testing practices. Instead,
training must focus on real teacher needs and provide guidance in quality
control for all teacher-made tests, including those based on observations
and subjective judgments.






REFERENCES 



Ahmann, J.S. and M.D. Glock. Evaluating Pupil Growth (fourth edition).
Allyn and Bacon, 1971.



Airasian, P.W., et al. Proportion and direction of teacher rating change
of pupil progress attributable to standardized test information.
Journal of Educational Psychology, 1977, 69(6), 702-709.

Brown, F.G. Principles of Educational and Psychological Testing. Dryden,
1970.

Burstein, L. A word about this issue. Journal of Educational
Measurement, 1983, 20(2), pp. 99-101.

Calfee, R.C. and P.A. Drum. How the Researcher Can Help the Reading
Teacher with Classroom Assessment. Unpublished manuscript, Stanford
University, Stanford, CA, 1976.

Coffman, W.E. Testing in the Schools: A Historical Perspective.
A paper presented at the Center for the Study of Evaluation Annual
Invitational Conference, UCLA, 1983.

Ebel, Robert L. Essentials of Educational Measurement (third edition).
Prentice-Hall, 1979.

Fitzpatrick, R. and E.J. Morrison. Performance and product evaluation.
In R.L. Thorndike (Ed.) Educational Measurement (second edition).
Washington, D.C.: American Council on Education, 1971, 237-270.

Glaser, R. and D.J. Klaus. Proficiency measurement: Assessing human
performance. In R.M. Gagne (Ed.) Psychological Principles in System
Development. New York, NY: Holt, Rinehart and Winston, 1962,
419-474.

Good, T.L. and J.E. Brophy. Looking in Classrooms (second edition).
New York, NY: Harper and Row, 1978.

Goslin, D.A. Teachers and Testing. New York, NY: Russell Sage
Foundation, 1967.

Gronlund, N.E. Measurement and Evaluation in Teaching (fourth edition).
Macmillan, 1981.

Gullickson, A. Some Data Collected in Survey of South Dakota Teacher's
Attitudes and Opinions toward Testing. University of South Dakota,
1982.






Hall, G.E., A.A. George and W.L. Rutherford. Measuring Stages of Concern
about the Innovation: A Manual for Use of the SoC Questionnaire.
Austin, TX: Research and Development Center for Teacher Education,
University of Texas, 1977.

Herman, J. and D.W. Dorr-Bremme. Assessing Students: Teachers' Routine
Practices and Reasoning. Paper presented at the annual meeting of
the American Educational Research Association, New York, 1982.

Kellaghan, T., G.F. Madaus and P.W. Airasian. The Effects of Standardized
Testing. Boston, MA: Kluwer-Nijhoff, 1982.

Lindquist, E.F. Preliminary considerations in objective test
construction. In E.F. Lindquist (Ed.) Educational Measurement.
Washington, D.C.: American Council on Education, 1951.

Lortie, D. School Teacher. Chicago, IL: University of Chicago Press,
1975.

Mehrens, W.A. and I.J. Lehmann. Measurement and Evaluation in Education
and Psychology. New York, NY: Holt, Rinehart and Winston, 1973.

Noll, V.H., D.P. Scannell, and R.C. Craig. Introduction to Educational
Measurement (fourth edition). Houghton Mifflin, 1979.

Rudman, H.E., et al. Integrating Assessment with Instruction: A Review
(1922-1980). Research Series #75. East Lansing, MI: College of
Education, Michigan State University, 1980.

Ryans, D.G. and N. Frederiksen. Performance tests of educational
achievement. In E.F. Lindquist (Ed.) Educational Measurement.
Washington, D.C.: American Council on Education, 1951, 455-493.

Salmon-Cox, L. Teachers and standardized achievement tests: What's
really happening? Phi Delta Kappan, May 1982, 631-634.

Scates, D.E. Differences between measurement criteria of pure scientists
and of classroom teachers. Journal of Educational Research, 1943,
37, 1-13.

Spandel, V. Classroom Applications of Writing Assessment. Portland, OR:
Northwest Regional Educational Laboratory, 1981.

Sproull, L. and D. Zubrow. Standardized testing from the administrative
perspective. Phi Delta Kappan, May 1982, 628-631.

Stetz, F. and M. Beck. Comments from the Classroom: Teachers' and
Students' Opinions of Achievement Tests. Paper presented at the
annual meeting of the American Educational Research Association, San
Francisco, 1979.






Stiggins, R.J. Evaluating Students Through Classroom Observation:
Watching Students Grow. Washington, D.C.: National Education
Association, in press.

Stiggins, R.J. and N.J. Bridgeford. Final Research Report on the Nature,
Role and Quality of Classroom Performance Assessment. Portland, OR:
Northwest Regional Educational Laboratory, 1982.



Yeh, J. Test Use in the Schools. Los Angeles, CA: Center for the Study
of Evaluation, UCLA, 1978.






